Literary authorship attribution with phrase-structure fragments
نویسنده
چکیده
We present a method of authorship attribution and stylometry that exploits hierarchical information in phrase-structures. Contrary to much previous work in stylometry, we focus on content words rather than function words. Texts are parsed to obtain phrase-structures, and compared with texts to be analyzed. An efficient tree kernel method identifies common tree fragments among data of known authors and unknown texts. These fragments are then used to identify authors and characterize their styles. Our experiments show that the structural information from fragments provides complementary information to the baseline trigram model.
منابع مشابه
Towards a better understanding of Burrows's Delta in literary authorship attribution
Burrows’s Delta is the most established measure for stylometric difference in literary authorship attribution. Several improvements on the original Delta have been proposed. However, a recent empirical study showed that none of the proposed variants constitute a major improvement in terms of authorship attribution performance. With this paper, we try to improve our understanding of how and why ...
متن کاملAuthorship Attribution in Bengali Language
We describe Authorship Attribution of Bengali literary text. Our contributions include a new corpus of 3,000 passages written by three Bengali authors, an end-toend system for authorship classification based on character n-grams, feature selection for authorship attribution, feature ranking and analysis, and learning curve to assess the relationship between amount of training data and test accu...
متن کاملThe Computational-Linguistic Approach to Forensic Authorship Attribution
This article examines the diversity of methods in authorship attribution through a lens which focuses attention on a single common element. The current state of authorship attribution study is spread throughout so many academic and non -academic disciplines that it is nigh impossible to describe all of the various assumptions about language and authorship. The disciplines involved in authorship...
متن کاملExplaining Delta, or: How do distance measures for authorship attribution work?
Authorship Attribution is a research area in quantitative text analysis concerned with attributing texts of unknown or disputed authorship to their actual author based on quantitatively measured linguistic evidence (see Juola 2006; Stamatatos 2009; Koppel et al. 2009). Authorship attribution has applications in literary studies, history, forensics and many other fields, e.g. corpus stylistics (...
متن کاملAuthorship Attribution and Author Profiling of Lithuanian Literary Texts
In this work we are solving authorship attribution and author profiling tasks (by focusing on the age and gender dimensions) for the Lithuanian language. This paper reports the first results on literary texts, which we compared to the results, previously obtained with different functional styles and language types (i.e., parliamentary transcripts and forum posts). Using the Naïve Bayes Multinom...
متن کامل